Abstract:
Language models are becoming increasingly popular in AI-powered scientific IR systems. This paper evaluates popular scientific language models in handling (i) short-query texts and (ii) textual neighbors. Our experiments showcase their inability to retrieve relevant documents for a short-query text even under the most relaxed conditions. Additionally, we leverage textual neighbors, generated by small perturbations to the original text, to demonstrate that not all perturbations lead to close neighbors in the embedding space. Further, an exhaustive categorization yields several classes of orthographically and semantically related, partially related, and completely unrelated neighbors. Retrieval performance turns out to be influenced more by the surface form than by the semantics of the text.
Abstract:
Interpretability is an emerging area of research in trustworthy machine learning. Safe deployment of machine learning systems mandates that the prediction and its explanation be reliable and robust. Recently, it has been shown that explanations can be manipulated easily by adding visually imperceptible perturbations to the input while keeping the model's prediction intact. In this work, we study the problem of attributional robustness (i.e., models having robust explanations) by showing an upper bound for attributional vulnerability in terms of spatial correlation between the input image and its explanation map. We propose a training methodology that learns robust features by minimizing this upper bound using a soft-margin triplet loss. Our methodology of robust attribution training (ART) achieves a new state-of-the-art attributional robustness measure by a margin of ≈6-18% on several standard datasets, i.e., SVHN, CIFAR-10, and GTSRB. We further show the utility of the proposed robust training technique (ART) in the downstream task of weakly supervised object localization by achieving new state-of-the-art performance on the CUB-200 dataset.
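The abstract above names a soft-margin triplet loss but does not state its form or how the triplets are built. As a rough illustration only, the sketch below uses the generic soft-margin formulation, log(1 + exp(d_pos - d_neg)); the function name and the two distance arguments are hypothetical and are not taken from the paper.

```python
import math


def soft_margin_triplet_loss(d_pos, d_neg):
    """Generic soft-margin triplet loss: log(1 + exp(d_pos - d_neg)).

    d_pos: distance between the anchor and the positive sample (assumed input).
    d_neg: distance between the anchor and the negative sample (assumed input).
    """
    x = d_pos - d_neg
    # Numerically stable softplus: max(x, 0) + log(1 + exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# The loss is small when the positive is much closer than the negative,
# and grows roughly linearly when the ordering is violated.
print(soft_margin_triplet_loss(0.2, 1.5))
print(soft_margin_triplet_loss(1.5, 0.2))
```

Unlike the hinge-style triplet loss, this variant has no hard margin hyperparameter and never becomes exactly zero, so every triplet keeps contributing a (possibly tiny) gradient.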
Abstract:
Information Extraction (IE) from the tables present in scientific articles is challenging due to complicated tabular representations and complex embedded text. This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles. TabLeX consists of two subsets, one for table structure extraction and the other for table content extraction. Each table image is accompanied by its corresponding LaTeX source code. To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts. Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail on even simple table images. Towards the end, we experiment with an existing transformer-based baseline to report performance scores. In contrast to static benchmarks, we plan to augment this dataset with more complex and diverse tables at regular intervals.
Abstract:
Adhesion is a key factor in the overall efficacy and effectiveness of polyurethane insulation. In spray foam, adhesion is vital for creating and maintaining an airtight seal. The most challenging conditions for adhesion in spray foam come from cold and wet environments. The cold temperature slows down reactivity and makes wet-out more difficult. Moisture on the surface promotes the urea reaction of isocyanate and water, which produces CO₂ and results in surface voids at the interface. It is a common issue for spray foam to fail to meet the structural and insulation needs of the client due to cold, wet surfaces. In polyisocyanurate foams, the adhesion of foam to the facer is important both for protecting the polyurethane foam and for maintaining high insulation values. The need for improved adhesion has driven this project, which will lead to a better understanding of the problems and potential solutions for different market segments. EP-AT-N11 is being developed as an experimental product targeted at the spray foam market, while EP-IS-1 is being developed to meet the needs of the PIR (polyisocyanurate) industry.
Abstract:
Digital advancement in scholarly repositories has led to the emergence of a large number of open access predatory publishers that charge authors high article processing fees but fail to provide the necessary editorial and publishing services. Identifying and blacklisting such publishers has remained a research challenge due to the highly volatile scholarly publishing ecosystem. This paper presents a data-driven approach to study how potential predatory publishers are evolving and bypassing several regularity constraints. We empirically show the close resemblance of predatory publishers to reputed publishing groups. In addition to verifying standard constraints, we also propose distinctive signals gathered from network-centric properties to understand this evolving ecosystem better. To facilitate reproducible research, we shall make all the code and the processed dataset available in the public domain.
Abstract:
Recently, neural networks have seen a surge in adoption due to their ability to provide high accuracy on various tasks. On the other hand, the existence of adversarial examples has raised suspicions regarding the generalization capabilities of neural networks. In this work, we focus on the weight matrix learned by a neural network and hypothesize that an ill-conditioned weight matrix is one of the factors contributing to the network's susceptibility to adversarial examples. To ensure that the condition number of the learned weight matrix remains sufficiently low, we suggest using an orthogonal regularizer. We show that this indeed helps in increasing the adversarial accuracy on the MNIST and F-MNIST datasets.
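The abstract does not give the exact form of the orthogonal regularizer. One common choice, assumed here purely for illustration, is the soft orthogonality penalty ||WᵀW − I||²_F, which is zero exactly when the columns of W are orthonormal (condition number 1) and grows as the singular values drift away from 1. The function name and the example matrices below are hypothetical.

```python
import math


def orthogonal_penalty(W):
    """Return ||W^T W - I||_F^2 for a matrix given as a list of rows.

    Assumed form of the regularizer; the source abstract does not specify it.
    """
    rows, cols = len(W), len(W[0])
    penalty = 0.0
    for i in range(cols):
        for j in range(cols):
            # (W^T W)[i][j] is the dot product of columns i and j of W.
            gram_ij = sum(W[k][i] * W[k][j] for k in range(rows))
            target = 1.0 if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty


# A rotation matrix is orthogonal, so its penalty is zero up to rounding.
theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

# Singular values 10 and 0.1 give a condition number of 100.
ill_conditioned = [[10.0, 0.0],
                   [0.0, 0.1]]

print(orthogonal_penalty(rotation))
print(orthogonal_penalty(ill_conditioned))
```

In training, such a term would typically be scaled by a small coefficient and added to the task loss for each weight matrix; that coefficient is a tuning choice and is not given in the abstract.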